Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

raft: Propose in raft node wait the proposal result so we can fail fast while dropping proposal #9137

Merged
merged 1 commit into from
Apr 4, 2018

Conversation

absolute8511
Copy link
Contributor

@absolute8511 absolute8511 commented Jan 12, 2018

While the leader is transferring or some other states which the raft node may drop the proposal, we should fail fast to notify the proposal canceled. (a separate pr followed by #9067)

This will be used to fix issue #8975 and #8977

@codecov-io
Copy link

codecov-io commented Jan 12, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@2aa3dec). Click here to learn what that means.
The diff coverage is 94.87%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master    #9137   +/-   ##
=========================================
  Coverage          ?   72.47%           
=========================================
  Files             ?      367           
  Lines             ?    31257           
  Branches          ?        0           
=========================================
  Hits              ?    22652           
  Misses            ?     6967           
  Partials          ?     1638
Impacted Files Coverage Δ
raft/node.go 89.24% <94.87%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2aa3dec...f0dffb4. Read the comment docs.

raft/node.go Outdated
// ProposeWithCancel proposes that data be appended to the log
// and will invoke the cancel() if the proposal is dropped.
// Application can use this to fail fast when the proposal is dropped or cancelled.
ProposeWithCancel(ctx context.Context, cancel context.CancelFunc, data []byte) error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need to add this? can we improve Propose func so that it would return an error if proposal is dropped?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, the Propose just send the proposal to chan, and will not wait. If we change the behavior for Propose to block and return an error it may affect others.

I think that it may be better to add a new method. Any suggestion?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly, cancel function is used to cancels the child outside, like:

ctx, cancel := WithCancel(parent)
pass(ctx)
cancel()

So it is strange for me to see this style here, maybe a callback is better.

Copy link
Contributor Author

@absolute8511 absolute8511 Jan 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the method name needs to be improved. A new propose method name should make the caller known that a cancel callback will be used to get notified when the proposal is canceled by dropping. (I found there is no way to cancel in context.Context, so I use the function context.CancelFunc as callback since It is actually canceled the context.Context. )

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, the Propose just send the proposal to chan, and will not wait.

that is true. however, step func should be fast and non-blocking most of the time. changing Proposal to wait a bit longer for stepping to finish should have very a little impact if any.

Maybe I missed something. What is your concern specifically?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@siddontang

exactly. if we want to make it work, it will return either a callback or a chan, which will be closes if canceled.

@siddontang
Copy link
Contributor

Any update? @absolute8511

We need this feature to improve the timeout for Read operation when leader changed.

/cc @xiang90

@absolute8511 absolute8511 changed the title raft: add new ProposeWithCancel for raft node so we can fail fast while dropping proposal raft: Propose in raft node wait the proposal result so we can fail fast while dropping proposal Jan 22, 2018
@absolute8511 absolute8511 force-pushed the raft-proposal-cancel branch 4 times, most recently from ea9e414 to 2505eed Compare January 23, 2018 03:12
@absolute8511
Copy link
Contributor Author

I changed the Propose to wait on the channel to allow it return the error while dropped by raft.

@xiang90

raft/node.go Outdated
@@ -224,10 +224,15 @@ func RestartNode(c *Config) Node {
return &n
}

type proposingMsg struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the message type inside this will not always be proposal.

renaming this to nodeMsg or msgWithResult is better.

raft/node.go Outdated
case m := <-n.recvc:
err := r.Step(m)
if pm.result != nil {
if err == ErrProposalDropped {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we only propagate ErrProposalDropped error? we should propagate back all types of errors.

raft/node.go Outdated
r.Step(m) // raft never returns an error
err := r.Step(m) // raft never returns an error
if pm.result != nil {
if err == ErrProposalDropped {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

raft/node.go Outdated
// filter out response message from unknown From.
if pr := r.getProgress(m.From); pr != nil || !IsResponseMsg(m.Type) {
r.Step(m) // raft never returns an error
err := r.Step(m) // raft never returns an error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove the wrong comment

raft/node.go Outdated
ch := n.recvc
if m.Type == pb.MsgProp {
ch = n.propc
}

pm := proposingMsg{m: m, result: nil}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to put nil in result. it would be nil by default.

raft/node.go Outdated
ch := n.recvc
if m.Type == pb.MsgProp {
ch = n.propc
}

pm := proposingMsg{m: m, result: nil}
if wait {
pm.result = make(chan error)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make(chan error, 1)

raft/node.go Outdated
propc chan pb.Message
recvc chan pb.Message
propc chan proposingMsg
recvc chan proposingMsg
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we really need to change the recvc chan? if the proposal is redirected from other peers, the waiter is actually on another node anyway. there is no easy to propagate back the error even with this change.

probably changing the propc is good enough for the current purpose.

@xiang90
Copy link
Contributor

xiang90 commented Jan 23, 2018

look good to me in general.

we probably want to hook this change up with etcd, and do some benchmarks. i want to make sure this PR wont bring in any significant performance issue.

@absolute8511
Copy link
Contributor Author

Yeah, a benchmark is needed. Should it in this PR or a new one?

@xiang90
Copy link
Contributor

xiang90 commented Jan 24, 2018

Yeah, a benchmark is needed. Should it in this PR or a new one?

Let us do it in the same PR, but in a different commit. Thanks. You can use tools/benchmark to perform benchmarks.

@siddontang
Copy link
Contributor

any update?

@absolute8511
Copy link
Contributor Author

Sorry for the delay, I will start tomorrow after my vacation.

@absolute8511
Copy link
Contributor Author

@xiang90

I looked into the tools/benchmark. What kinds of benchmark do this pr need? I noticed there are some benchmark for write. In this pr I only changed the Propose behavior, so I just re-run the write benchmark to get a result or a new benchmark tool need?

@xiang90
Copy link
Contributor

xiang90 commented Feb 3, 2018 via email

@robin865
Copy link

robin865 commented Feb 6, 2018

So it seems like these changes won't exactly solve this problem? Just a comment, but node.go; step function:

// Step advances the state machine using msgs. The ctx.Err() will be returned,
// if any.
        func (n *node) step(ctx context.Context, m pb.Message) error {
                ch := n.recvc
                if m.Type == pb.MsgProp {
                        ch = n.propc
                }
                select {
                case ch <- m:
                        return nil
                case <-ctx.Done():
                        return ctx.Err()
                case <-n.done:
                        return ErrStopped
                }
        }

When the leader changes, n.propc is now a nil channel. Hence the "case ch <- m:" will block and only the "ctc.Done()" will exit this select. In the changes proposed for this request; I don't see this problem solved. Hence I don't see this allow for "fast failures" in this scenario.

Seems to me we could do something simple here before we send to 'ch';

if ch == nil {
    return Err;
}

@absolute8511
Copy link
Contributor Author

absolute8511 commented Feb 8, 2018

@xiang90

I run the benchmark on my MacBook using the version with and without this pr and got the below result. I run several times and noticed the result is not very stable. Maybe a standard GCE machine is needed.
benchmark command as below:

./benchmark --endpoints=127.0.0.1:2379 --target-leader --conns=1 --clients=1  put --key-size=8 --sequential-keys --total=10000 --val-size=128
./benchmark --endpoints=127.0.0.1:2379 --target-leader --conns=1 --clients=64  put --key-size=8 --sequential-keys --total=80000 --val-size=128
./benchmark --endpoints=127.0.0.1:2379 --target-leader --conns=1 --clients=256  put --key-size=8 --sequential-keys --total=80000 --val-size=128
./benchmark --endpoints=127.0.0.1:2379 --conns=1 --clients=64  put --key-size=8 --sequential-keys --total=80000 --val-size=128
./benchmark --endpoints=127.0.0.1:2379 --conns=1 --clients=256  put --key-size=8 --sequential-keys --total=80000 --val-size=128

Physical machines

MacBook Pro (Retina, 13-inch, Mid 2014)
Processor: 2.6 GHz Intel Core i5
Memory: 8 GB 1600 MHz DDR3

Test without this pr

etcd Cluster

3 etcd at Git commit:

etcd Version: 3.3.0+git
Git SHA: c5532ebbf
Go Version: go1.9.3
Go OS/Arch: darwin/amd64

Performance in writing one single key

key size in bytes number of clients target etcd server write QPS 90th Percentile Latency (ms)
8 1 leader only 104 13.9
8 64 leader only 2310 40.54
8 256 leader only 5212 65.5
8 64 all servers 2215 41.5
8 256 all servers 5410 62.5

Test with this pr

etcd Cluster

3 etcd at Git commit:

etcd Version: 3.3.0+git
Git SHA: c5532ebbf with this pr patched
Go Version: go1.9.3
Go OS/Arch: darwin/amd64

Performance in writing one single key

key size in bytes number of clients target etcd server write QPS 90th Percentile Latency (ms)
8 1 leader only 106 13.8
8 64 leader only 2316 40.1
8 256 leader only 5057 69.6
8 64 all servers 2297 40.7
8 256 all servers 5424 59.7

@absolute8511
Copy link
Contributor Author

absolute8511 commented Mar 1, 2018

@xiang90

I re-run the benchmark on standard GCE machine again and use the hack/benchmark script to do the benchmark.

Physical machines

GCE n1-standard-2(2 vCPU,7.5 GB memory )

  • 1x dedicated local SSD mounted as etcd data directory
  • 7.5 GB memory
  • 2x CPUs

Test without this pr

etcd Cluster

3 etcd at Git commit:

etcd Version: 3.3.0+git
Git SHA: c5532ebbf
Go Version: go1.9.4
Go OS/Arch: linux/amd64

Performance in writing one single key

key size in bytes number of clients target etcd server write QPS 90th Percentile Latency (ms)
64 1 leader only 524 2.2
64 64 leader only 7025 11.8
64 256 leader only 10664 33.0
64 64 all servers 3856 7.2
64 256 all servers 6776 18.9

Test with this pr

etcd Cluster

3 etcd at Git commit:

etcd Version: 3.3.0+git
Git SHA: c5532ebbf + pr
Go Version: go1.9.4
Go OS/Arch: linux/amd64

Performance in writing one single key

key size in bytes number of clients target etcd server write QPS 90th Percentile Latency (ms)
64 1 leader only 582 1.98
64 64 leader only 7307 11.3
64 256 leader only 9803 35.36
64 64 all servers 4645 6.0
64 256 all servers 6726 20.3

@absolute8511 absolute8511 force-pushed the raft-proposal-cancel branch from 068e401 to 5be3085 Compare March 1, 2018 11:59
@siddontang
Copy link
Contributor

Seem that if we don't have many clients, the PR's performance is better, suprised.

@siddontang
Copy link
Contributor

any update? @xiang90

@absolute8511
Copy link
Contributor Author

Are there any other works need to make it merge? @xiang90 @gyuho

raft/node.go Outdated
ch := n.recvc
if m.Type == pb.MsgProp {
ch = n.propc
func (n *node) stepWait(ctx context.Context, m pb.Message, wait bool) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename this to stepWithWaitOption. change stepWait to call into stepWithWaitOption(..., true)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do.

case m := <-n.recvc:
// filter out response message from unknown From.
if pr := r.getProgress(m.From); pr != nil || !IsResponseMsg(m.Type) {
r.Step(m) // raft never returns an error
r.Step(m)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we consume the returned error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you comment before:

do we really need to change the recvc chan? if the proposal is redirected from other peers, the waiter is actually on another node anyway. there is no easy to propagate back the error even with this change.
probably changing the propc is good enough for the current purpose.

I leaved this unchanged

@xiang90
Copy link
Contributor

xiang90 commented Mar 30, 2018

@absolute8511 can you please reply to the comment #9137 (comment)?

@siddontang
Copy link
Contributor

@absolute8511

In our test by @nolouch, this still can't fail fast when leader changed sometimes.

@absolute8511
Copy link
Contributor Author

absolute8511 commented Mar 31, 2018

@Rsm10 @xiang90 @siddontang

When the leader changes, n.propc is now a nil channel.

I think this is not always true. And this pr should only solve part of the fail fast. Only proposal dropped by the local Step will fail fast in this pr. Other fail fast issue may need another pr.

If dropped by remote currently we just wait.

@siddontang could you please give some brief test cases which can't fail fast currently, maybe we need some other pr in future to solve other fail fast issue.

@absolute8511 absolute8511 force-pushed the raft-proposal-cancel branch from 5be3085 to bbb9ba9 Compare March 31, 2018 14:36
@siddontang
Copy link
Contributor

@absolute8511

Only proposal dropped by the local Step will fail fast in this pr. Other fail fast issue may need another pr.

You have already explained. If we isolate a follower and rejoin again, the leader will step down, at that time, all the proposed requests can't be notified quickly but have to wait timeout.

@absolute8511 absolute8511 force-pushed the raft-proposal-cancel branch from bbb9ba9 to 76fc899 Compare April 1, 2018 14:45
@absolute8511
Copy link
Contributor Author

@xiang90
The CI result:

Expected commit title format '<package>{", "<package>}: <description>'
Got: 3d504737e add chain core to raft users list

Have no idea what's wrong with my commit title?

@xiang90
Copy link
Contributor

xiang90 commented Apr 1, 2018

@absolute8511 rebase with the current master?

@absolute8511 absolute8511 force-pushed the raft-proposal-cancel branch 2 times, most recently from 8071ab1 to cd15aa9 Compare April 2, 2018 02:47
@absolute8511 absolute8511 force-pushed the raft-proposal-cancel branch from cd15aa9 to f0dffb4 Compare April 3, 2018 03:04
@absolute8511
Copy link
Contributor Author

@xiang90
All done.

@xiang90
Copy link
Contributor

xiang90 commented Apr 4, 2018

lgtm

@guoger
Copy link

guoger commented Mar 19, 2019

@absolute8511 doing some archaeological work here and i'm wondering why wasn't ProposeConfChange changed as part of this PR? thx!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

7 participants